Associative Document Search using a Probabilistic Document Clustering
Similar resources
Large-scale Document Clustering for Associative Document Search
Approximate algorithms for clustering large-scale document collections are proposed and evaluated in the context of cluster-based document retrieval (i.e., associative document search). These algorithms use a precise clustering algorithm as a subroutine to construct a stratified structure of cluster trees. An experiment showed a speedup of more than 100 times in CPU time at best. T...
Faster Exact Search Using Document Clustering
We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD – Search with Clustered Documents). We develop a fast multilevel clustering algorithm that explicitly uses query cost for conjunctive queries as an objective function. Depending on the inputs, search is up to four times faster than non-clustered search. The resulting ...
Document Clustering for Distributed Fulltext Search
Recent research efforts in peer-to-peer (P2P) systems concentrate on providing a “distributed hash table”-like primitive in the P2P system (Stoica et al., 2001). However, to make P2P systems useful, we need to build a keyword search engine to index the entire document collection in the distributed system. Doing keyword search in a distributed environment poses new challenges for traditional inf...
Improving Text Search Process using Text Document Clustering Approach
Knowledge discovery and data mining is the process of extracting meaningful knowledge from raw data using a variety of techniques. Text mining is the subdomain concerned with knowledge discovery from text data. This paper offers a different way of understanding text mining and its use in various real-time applications. It also includes the design of a hybrid tex...
Online Document Clustering Using GPUs
An algorithm for performing online clustering on the GPU is proposed which makes heavy use of the atomic operations available on the GPU. The algorithm can cluster multiple documents in parallel in a way that can saturate all the parallel threads on the GPU. The algorit...
Journal
Journal title: Journal of Natural Language Processing
Year: 1998
ISSN: 1340-7619, 2185-8314
DOI: 10.5715/jnlp.5.101